SYNTHETIC LEADER PEPTIDE SEQUENCES



FIELD OF INVENTION 

The present invention relates to novel synthetic leader peptide sequences for secreting 
polypeptides in yeast. 

BACKGROUND OF THE INVENTION 

Yeast organisms produce a number of proteins which are synthesized intracellularly
but which have a function outside the cell. Such extracellular proteins are referred to as
secreted proteins. These secreted proteins are expressed initially inside the cell in a
precursor or pre-protein form containing a pre-peptide sequence that ensures effective
direction of the expressed product into the secretory pathway of the cell, across the
membrane of the endoplasmic reticulum (ER). The pre-sequence, normally named a
signal peptide, is generally cleaved off from the desired product during translocation.
Once it has entered the secretory pathway, the protein is transported to the Golgi
apparatus. From the Golgi, the protein can follow different routes that lead to
compartments such as the cell vacuole or the cell membrane, or it can be routed out of
the cell to be secreted to the external medium (Pfeffer, S.R. and Rothman, J.E.
Ann. Rev. Biochem. 56 (1987) 829-852).

Several approaches have been suggested for the expression and secretion in yeast of 
proteins heterologous to yeast. European published patent application No. 88 632 
describes a process by which proteins heterologous to yeast are expressed, 
processed and secreted by transforming a yeast organism with an expression vehicle 
harbouring DNA encoding the desired protein and a signal peptide, preparing a culture 
of the transformed organism, growing the culture and recovering the protein from the 
culture medium. The signal peptide may be the signal peptide of the desired protein 
itself, a heterologous signal peptide or a hybrid of native and heterologous signal
peptides. 

A problem encountered with the use of signal peptides heterologous to yeast might be
that the heterologous signal peptide does not ensure efficient translocation and/or
cleavage of the precursor polypeptide after the signal peptide.

The Saccharomyces cerevisiae MFα1 (α-factor) is synthesized as a pre-pro form of
165 amino acids comprising a signal or pre-peptide of 19 amino acids followed by a
"leader" or pro-peptide of 64 amino acids, encompassing three N-linked glycosylation
sites, followed by (LysArg((Asp/Glu)Ala)2-3α-factor)4 (Kurjan, J. and Herskowitz, I. Cell
30 (1982) 933-943). The signal-leader part of the pre-pro MFα1 has been widely
employed to obtain synthesis and secretion of heterologous proteins in S. cerevisiae.

Use of signal/leader peptides homologous to yeast is known from, inter alia, US patent
specification No. 4,546,082, European published patent applications Nos. 116 201,
123 294, 123 544, 163 529 and 123 289 and DK patent application No. 3614/83.

In EP 123 289 utilization of the S. cerevisiae a-factor precursor is described, whereas
WO 84/01153 indicates utilization of the S. cerevisiae invertase signal peptide and DK
3614/83 utilization of the S. cerevisiae PHO5 signal peptide for secretion of foreign
proteins.

US patent specification No. 4,546,082 and EP 116 201, 123 294, 123 544 and 163 529
describe processes by which the α-factor signal-leader from S. cerevisiae (MFα1 or
MFα2) is utilized in the secretion process of expressed heterologous proteins in yeast.
By fusing a DNA sequence encoding the S. cerevisiae MFα1 signal/leader sequence
at the 5' end of the gene for the desired protein, secretion and processing of the desired
protein was demonstrated.

EP 206 783 discloses a system for the secretion of polypeptides from S. cerevisiae
using an α-factor leader sequence which has been truncated to eliminate the four
α-factor units present on the native leader sequence so as to leave the leader peptide
itself fused to a heterologous polypeptide via the α-factor processing site
LysArgGluAlaGluAla. This construction is indicated to lead to efficient processing of
smaller peptides (less than 50 amino acids). For the secretion and processing of larger
polypeptides, the native α-factor leader sequence has been truncated to leave one or
two of the α-factor units between the leader peptide and the polypeptide.

A number of secreted proteins are routed so as to be exposed to a proteolytic
processing system which can cleave the peptide bond at the carboxy end of two
consecutive basic amino acids. In S. cerevisiae this enzymatic activity is encoded by
the KEX2 gene (Julius, D.A. et al., Cell 37 (1984b) 1075). Processing of the product
by the KEX2 protease is needed for the secretion of active S. cerevisiae mating factor
α1 (MFα1 or α-factor), whereas KEX2 is not involved in the secretion of active
S. cerevisiae mating factor a.

Secretion and correct processing of a polypeptide intended to be secreted is obtained
in some cases when culturing a yeast organism which is transformed with a vector
constructed as indicated in the references given above. In many cases, however, the
level of secretion is very low or there is no secretion, or the proteolytic processing may
be incorrect or incomplete, resulting in secretion of a considerable amount of
leader-bound product polypeptide. Prosequences, and especially N-terminally located
prosequences or leader sequences expressed in eucaryotic cells such as yeast cells,
are extensively glycosylated, cf. Fiedler and Simons, Cell 81, p. 309-312; and Moir,
D.T., Yeast mutants with increased secretion efficiency, in Yeast Genetic Engineering,
Barr, P.J., Brake, A.J., and Valenzuela, P., eds., wherein a general review of
glycosylation and secretion of proteins is presented. It is generally recognised that
glycosylation, which may be N-linked, O-linked, or both, is important for efficient
transport through the secretory pathway, cf. Caplan et al., Journal of Bacteriology,
Vol. 173, No. 2, p. 627-635; and Jars et al., The Journal of Biological Chemistry,
Vol. 270, No. 42, p. 24810-24817. Moreover, due to the extensive glycosylation, the
purification of secreted propeptides is difficult and differs considerably from the
processing steps that are typically employed for the purification of the mature secreted
polypeptide.
Clements et al., Gene 106 (1991) 267-272, secreted hEGF from yeast using a
eucaryotic consensus signal sequence together with two 19-amino-acid pro-sequences
comprising fragments of the α-factor leader, identical except for the presence or
absence of a potential Asn-linked (N-linked) glycosylation site. The glycosylation site
had no effect on secretion, and the level of secretion was comparable to the level
obtained when using the α-factor prepro-sequence (about 3 µg/ml).

Expression of heterologous proteins as fusion proteins is a well-known concept and
has been utilized in various contexts in different organisms. Secretory expression of a
heterologous protein in yeast is often performed as a fusion protein with a secretion
prepro-leader to confer secretion competence. Prepro-leaders tend to be
hyperglycosylated or extensively O-linked glycosylated in the S. cerevisiae secretory
pathway. Purification of hyperglycosylated fusion protein is laborious due to its
heterogeneous nature. Efficient prepro-leaders lacking hyperglycosylation, with no or
limited O-linked glycosylation, and with the dibasic Kex2 endoprotease site replaced by
a more convenient enzymatic processing site, provide an alternative to conventional
yeast expression: the fusion protein is purified and subsequently matured in vitro with
a suitable enzyme, as exemplified herein for the insulin precursor.
In vitro maturation of a purified fusion protein is more flexible, since dependency on the
Kex2 endoprotease is eliminated and any proteolytic enzyme can be used for
maturation, provided that the heterologous protein does not have any internal
processing sites for that enzyme. Purification of the fusion protein from the culture
supernatant followed by in vitro maturation will avoid N-terminal processing of the
heterologous protein by dipeptidyl aminopeptidase. Secretion of a fusion protein rather
than the heterologous protein has the advantage that the propeptide may increase
stability and solubility until purification and maturation. Secretory expression in yeast
of heterologous proteins with internal dibasic sites may lead to Kex2 endoprotease
processing and a decrease in fermentation yield. This can be avoided by using a
secretion prepro-leader lacking N-linked glycosylation to confer secretion competence,
introducing a suitable enzyme processing site between the prepro-leader and the
heterologous protein, and expressing the fusion in a Kex2 endoprotease-negative
S. cerevisiae strain, followed by purification and in vitro maturation.

It is an object of the present invention to provide novel synthetic leader peptides or
pro-sequences which ensure a higher yield and a more efficient recovery and/or
processing of polypeptides, preferably secreted polypeptides, including leader-bound
polypeptides and polypeptides fused N-terminally to peptide sequences including
leader sequences and/or spacer sequences, each of which is optionally separated from
the other constituent sequences by a processing site, expressed in a eucaryotic host
cell organism, preferably a fungal cell, such as a yeast cell or a filamentous fungus cell.

SUMMARY OF THE INVENTION 



A novel type of synthetic leader peptide has been found which allows secretion in high 
yield and/or improved recovery of a polypeptide produced in yeast. 

Accordingly, the present invention relates to a DNA construct encoding a polypeptide
and having the structure SP-LP-(PS)-(S)-(PS)-*gene*, wherein SP is a DNA
sequence (presequence) encoding a signal peptide, LP is a DNA sequence
encoding a synthetic leader peptide (propeptide) lacking N-linked glycosylation, PS is
an optional DNA sequence encoding a protease processing site, S is a DNA
sequence encoding a spacer peptide, and *gene* is a DNA sequence encoding a
polypeptide. The structure SP-LP-(PS)-(S)-(PS)-*gene* comprises the following
structures: SP-LP-PS-S-PS-*gene*, SP-LP-PS-*gene*, SP-LP-PS-S-*gene*,
SP-LP-S-*gene*, SP-LP-S-PS-*gene*, and SP-LP-*gene*; in structures containing
more than one PS, these may be the same or different.

Preferably, PS is a DNA sequence encoding a yeast protease processing site, such
as an endopeptidase processing site, and LP is preferably a DNA sequence encoding
a synthetic leader peptide or prepro-leader with the general formula I:

(Q/S)PIDDTESQTTSVNLMADDTES(A/R)FATYTXLDW(N/G)L(ISMA/PGA)KR (I)

wherein

X is a codable amino acid or preferably a sequence of from 1 to 5 codable amino
acids which may be the same or different, each preferably selected from the group
consisting of T, L, A, V, D, P, H, N, S and G, and Y is a codable amino acid selected
from the group consisting of Q and N; the C-terminal KR is an optional dibasic
processing site.
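The variable positions in formula I can be written as a regular expression, which makes it easy to check whether a given leader fits the formula. The Python sketch below is illustrative only: it assumes the preferred case, in which X is 1 to 5 residues drawn from T, L, A, V, D, P, H, N, S and G. It reads (Q/S), (A/R), (N/G) and (ISMA/PGA) as alternatives in the bracket notation used in this specification and treats the C-terminal KR as optional. The function name `matches_formula_i` is hypothetical.

```python
import re

# Formula I, preferred case: X is 1-5 residues from {T, L, A, V, D, P, H, N, S, G}
# and Y is Q or N. Alternatives written as (A/B) in the text become [AB] or (?:..|..).
FORMULA_I = re.compile(
    r"[QS]P"                   # (Q/S)P
    r"IDDTESQTTSVNLMADDTES"    # invariant stretch
    r"[AR]FAT"                 # (A/R)FAT
    r"[QN]T"                   # Y (Q or N) followed by T
    r"[TLAVDPHNSG]{1,5}"       # X: 1 to 5 residues from the preferred group
    r"LDW[NG]L"                # LDW(N/G)L
    r"(?:ISMA|PGA)"            # (ISMA/PGA)
    r"(?:KR)?"                 # optional C-terminal dibasic processing site
)


def matches_formula_i(leader: str) -> bool:
    """Return True if the entire one-letter leader sequence fits formula I."""
    return FORMULA_I.fullmatch(leader) is not None


# Two leaders from Table 1: LA23 fits formula I, the deletion variant TA76 does not.
print(matches_formula_i("QPIDDTESQTTSVNLMADDTESRFATQTTLALDWNLISMAKR"))  # True
print(matches_formula_i("QPIDDTESQTTSVNLMADDTESRFATQTTLPGAKR"))          # False
```

Because `fullmatch` is used, a leader only qualifies if the whole sequence is accounted for; the regex engine backtracks within X so that the fixed LDW motif is still found.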

More preferably, LP is a DNA sequence encoding a synthetic leader peptide with the
general formula II:
QPIDD(A/D)E(A/D)Q(A/D)(A/D)(
VNLI(A/D)MAKR (II)

wherein (A/D) can be any codable amino acid, but preferably is alanine (A), serine (S),
or aspartic acid (D); the C-terminal KR is an optional dibasic processing site,
or LP is a DNA sequence encoding a synthetic leader peptide with the general formula
III:

QPIDD(A/D)E(A/D)Q(A/D)(A/D)(
VNLI(A/D)MA (III)

wherein (A/D) can be any codable amino acid, but preferably is alanine (A), serine (S),
or aspartic acid (D). In formulas I and II above, the C-terminal amino acids KR define a
yeast processing site which is optional.

In the present context, the expression "leader peptide" is understood to indicate a
pro-peptide sequence whose function is to allow the expressed polypeptide product of
*gene*, optionally fused at its N-terminal to a spacer peptide and/or a sequence of one
or more amino acids defining a processing site, to be directed from the endoplasmic
reticulum to the Golgi apparatus and further to a secretory vesicle for secretion into the



WO 98/32867 PCT/DK98/00026 


medium (i.e. exportation of the expressed polypeptide across the cellular membrane
and cell wall, if present, or at least through the cellular membrane into the periplasmic
space of a cell having a cell wall). The term "synthetic" used in connection with leader
peptides is intended to indicate that the leader peptide is one not found in nature; in
particular, the leader peptide sequences of the present invention do not include the
α-factor leader sequence or fragments and constructs thereof, such as the sequence
QPVISTTVGSAAEGSLDKR, nor a leader sequence derived from the S. cerevisiae
HSP150 protein having extensive O-linked glycosylation, cf. Simonen, M., Vihinen, H.,
Jämsä, E., Arumäe, U., Kalkkinen, N., and Makarow, M. (1996) The hsp150Δ-carrier
confers secretion competence to the rat nerve growth factor receptor ectodomain in
Saccharomyces cerevisiae. Yeast 12, 457-466; and Jämsä, E., Holkeri, H., Vihinen, H.,
Wikström, M., Simonen, M., Walse, B., Kalkkinen, N., Paakkola, J., and Makarow, M.
(1995) Structural features of a polypeptide carrier promoting secretion of a
beta-lactamase fusion protein in yeast. Yeast 11, 1381-1391.

The term "signal peptide" is understood to mean a pre-sequence which is
predominantly hydrophobic in nature and present as an N-terminal sequence of the
precursor form of an extracellular protein, preferably when expressed in yeast. The
function of the signal peptide is to allow the expressed protein that is to be secreted to
enter the endoplasmic reticulum. The signal peptide is normally cleaved off in the
course of this process. The signal peptide may be heterologous or homologous to the
organism producing the protein.

The expression "polypeptide" is intended to indicate a heterologous polypeptide, i.e. a
polypeptide or protein which is not produced by the host organism, preferably yeast, in
nature, as well as a homologous polypeptide, i.e. a polypeptide which is produced by
the host organism, preferably a yeast, in nature, and any preform thereof. In a preferred
embodiment, the DNA construct of the present invention encodes a heterologous
polypeptide.

The expression "a codable amino acid" is intended to indicate an amino acid which can
be coded for by a triplet ("codon") of nucleotides. 

When, in the amino acid sequences given in the present specification, the one- or
three-letter codes of two amino acids, separated by a slash, are given in brackets, e.g.
(D/E), this is intended to indicate that the sequence has either the one or the other of
these amino acids in the pertinent position.

The expression "heterologous protein" is intended to indicate a protein or polypeptide
which is not produced by the host organism in nature; preferably the protein or
polypeptide is heterologous in yeast.

The expression "spacer peptide" is intended to indicate an oligopeptide sequence of
one or more amino acid residues, preferably 1 to 12 amino acid residues, more
preferably about 4 to 6 amino acid residues, such as EEAEPK, EEGEPK,
E(EA)3EPK, and EEPK, which may include a processing site, preferably situated
N-terminally and/or C-terminally.



BRIEF DESCRIPTION OF THE DRAWINGS

The present invention is further illustrated with reference to the appended drawings 
wherein 

Fig. 1 shows the expression plasmid pAK773 containing genes expressing the
N-terminally extended polypeptides of the invention. In Fig. 1 the following
symbols are used: TPI-PROMOTER: Denotes the TPI gene promoter
sequence from S. cerevisiae. 2: Denotes the region encoding a signal/leader
peptide (e.g. from the YAP3 signal peptide and LA19 leader peptide in
conjunction with the EEGEPK N-terminally extended MI3 insulin precursor).
TPI-TERMINATOR: Denotes the TPI gene terminator sequence of S. cerevisiae.
TPI-POMBE: Denotes the TPI gene from S. pombe. Origin: Denotes a sequence
from the S. cerevisiae 2 µ plasmid including its origin of DNA replication in
S. cerevisiae. AMP-R: Sequence from pBR322/pUC13 including the ampicillin
resistance gene and an origin of DNA replication in E. coli.

Fig. 2 shows an example of a DNA sequence pAK855 (SEQ ID No. 1) encoding the
YAP3 signal peptide, a leader without potential N-linked glycosylation sites, the
TA57 leader, and the EEGEPK-MI3 insulin precursor complex.

Fig. 3 shows an example of a DNA sequence (SEQ ID No. 2) encoding the YAP3
signal peptide, a leader without potential N-linked glycosylation sites, the
leader TA67, and the MI3 insulin precursor complex without N-terminal extension.

Fig. 4 shows the expression plasmid pAK855 containing genes expressing the 
leader sequences of the invention. 

Fig. 5 shows in vitro conversion of LA34/IP fusion protein by Achromobacter lyticus 
lysyl specific protease as a plot of the conversion of LA34/IP fusion protein by 
Sepharose-bound Achromobacter lyticus lysyl specific protease vs. time. A 
curve for a first order reaction with (pseudo-)equilibrium is fitted to the data 
points. 
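Fig. 5 does not state the fitted equation. One standard form for a first-order reaction approaching a (pseudo-)equilibrium, given here only as an assumption about the kind of curve meant, treats conversion as a reversible first-order step A ⇌ B. Here A is the fusion protein, B is the converted product, and k_f and k_r are assumed forward and reverse rate constants. Starting from pure A at concentration A_0:

```latex
\frac{dB}{dt} = k_f\,(A_0 - B) - k_r\,B
\quad\Longrightarrow\quad
B(t) = B_{\mathrm{eq}}\left(1 - e^{-(k_f + k_r)\,t}\right),
\qquad
B_{\mathrm{eq}} = \frac{k_f\,A_0}{k_f + k_r}
```

In such a fit, the observed rate constant is k_f + k_r, and the plateau B_eq lies below A_0 whenever k_r > 0.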

Fig. 6 shows mass spectrometry of in vitro maturation of purified LA34 prepro-leader 
insulin precursor (MI3) fusion protein by Achromobacter lyticus lysyl specific 
endoprotease. 

DETAILED DISCLOSURE OF THE INVENTION 

Preferred leader sequences of the invention are shown in Table 1 below. 
Table 1

Strain   Leader No.  Leader sequence                                    SEQ ID No.
yAK744   LA23        QPIDDTESQTTSVNLMADDTESRFATQTTLALDWNLISMAKR         3
yAK857   TA54        QPIDDTESQTTSVNLMADDTESRFATQTPLALDWNLISMAKR         4
yAK858   TA56        QPIDDTESQTTSVNLMADDTESRFATNTNALDWNLISMAKR          5
yAK862   TA57        QPIDDTESQTTSVNLMADDTESAFATQTNSGGLDWGLISMAKR        6
yAK861   TA59        QPIDDTESQTTSVNLMADDTESAFATQTTSVGGLDWGLISMAKR       7
-        LA64        QPIDDTESQTTSVNLMADDTESRFATQTTLALDWNLPGAKR          8
-        -           QPIDDTESQTTSVNLMADDTESAFATQTNSGGLDWGLPGAKR         9
-        TA101       QPIDDTESQTTSVNLMADDTESAFATQTNSGGLDWGLISMA          10
-        TA67        QPIDDTESQTTSVNLMADDTESAFATQTTSVGGLDWGLPGAKR        11
-        TA68        QPIDDTESQTTSVNLMADDTESAFATQTPLALDWNLISMAKR         12
-        TA74        QPIDDTESQTTSVNLMADDTESRFATQTTLALDWNLISMA           13
-        TA76        QPIDDTESQTTSVNLMADDTESRFATQTTLPGAKR                14
-        TA77        QPIDDTESQTTSVNLMADDTESRALDWNLPGAKR                 15
-        TA78        QPIDDTESQTTSVNLMFATQTTLALDWNLPGAKR                 16
-        TA79        QPIDDTESQADDTESRFATQTTLALDWNLPGAKR                 17
-        TA80        QPTTSVNLMADDTESRFATQTTLALDWNLPGAKR                 18
-        TA89        QPIDDTESQTTSVNLMADDTESAFATQTNSGGLDWGNTTLISMAKR     19
-        TA90        QPIDDTESQTTSVNLMADDTESAFATQTNSGGLDWGLINTTMAKR      20

In the sequences of Table 1 the C-terminal KR defines a dibasic protease processing 
site. 

Further preferred leader sequences of the invention are shown in Tables 1a and 1b
below.

Table 1a

Leader No.  Leader sequence                                                          SEQ ID No.
TA75        QPIDD(A/D)E(A/D)Q(A/D)(A/D)(A/D)VNLMADD(A/D)E(A/D)AFA(A/D)Q(A/D)PLALDWNLISMA   21
TA75.50     QPIDDAEAQAAAVNLMADDDEGFAAQAPLALDWNLISMA                                  22
TA75.15     QPIDDAEAQDDDVNLMADDDGRFADQAPLALDWNLISMA                                  23
TA75.4      QPIDDAEAQDAAVNLMADDGRLKIRFAAQAPLALDWNLISMA                               24
TA75.51     QPIDDAEDQAAAVNLMADDDEDGFAAQAPLALDWNLISMA                                 25
TA75.58     QPIDDAEAQDDDVNLMADDDGRFAAQAPLALDWNLISMA                                  26
TA75.64     QPIDDAEAODDDVNLMADDDGRFAAQAPLALDWNLISMA                                  27
Any of the above in which SMA is replaced by X1MA, wherein X1 may be any
codable amino acid, preferably a hydrophilic amino acid                              28



Table 1b

Leader No.  Leader sequence                            SEQ ID No.
TA91        QPTTSVNLMADDTESAFATQTNSGGLDWGLISMAKR        29
TA92        QPIDDTESQADDTESAFATQTNSGGLDWGLISMAKR        30
TA93        QPIDDTESQTTSVNLMFATQTNSGGLDWGLISMAKR        31
TA94        QPIDDTESQTTSVNLMADDTESAGGLDWGLISMAKR        32
TA95        QPIDDTESQTTSVNLMADDTESAFATQTNSLISMAKR       33
TA96        QPIDDTESQTTSVNLMADDTESAFATQTNSGGLMAKR       34
TA97        QPIDDTESQTTSVNLMADDTESALISMAKR              35
TA98        QPIDDTESQTTSVNLMLISMAKR                     36



The heterologous protein or polypeptide produced by the method of the invention
may be any protein which may advantageously be produced in yeast. Preferred
examples of such proteins are aprotinin, tissue factor pathway inhibitor or other
protease inhibitors, insulin or insulin precursors, insulin analogues, insulin-like
growth factors, such as IGF I and IGF II, human or bovine growth hormone,
interleukin, tissue plasminogen activator, glucagon, glucagon-like peptide-1 (GLP-1),
glucagon-like peptide-2 (GLP-2), GRPP, Factor VII, Factor VIII, Factor XIII,
platelet-derived growth factor, enzymes, such as lipases, or a functional analogue of
any one of these proteins. More preferred proteins are precursors of insulin and
insulin-like growth factors, and especially the smaller peptides of the proglucagon
family, such as glucagon, GLP-1, GLP-2 and GRPP, including truncated forms, such
as GLP-1(1-45), GLP-1(1-39), GLP-1(1-38), GLP-1(1-37), GLP-1(1-36), GLP-1(1-35),
GLP-1(1-34), GLP-1(7-45), GLP-1(7-39), GLP-1(7-38), GLP-1(7-37), GLP-1(7-36),
GLP-1(7-35) and GLP-1(7-34).

In the present context, the term "functional analogue" is meant to indicate a
polypeptide with a similar function as the native protein (this is intended to be
understood as relating to the nature rather than the level of biological activity of the
native protein). The polypeptide may be structurally similar to the native protein and
may be derived from the native protein by addition of one or more amino acids to
either or both the C- and N-terminal end of the native protein, substitution of one or
more amino acids at one or a number of different sites in the native amino acid
sequence, deletion of one or more amino acids at either or both ends of the native
protein or at one or several sites in the amino acid sequence, or insertion of one or
more amino acids at one or more sites in the native amino acid sequence. Such
modifications are well known for several of the proteins mentioned above.

The precursors of insulin, including proinsulin as well as precursors having a truncated
and/or modified C-peptide or completely lacking a C-peptide, precursors of insulin
analogues, and insulin-related peptides, such as insulin-like growth factors, may be of
human origin or from other animals, and from recombinant or semisynthetic sources.
The cDNA used for expression of the precursors of insulin, precursors of insulin
analogues, or insulin-related peptides in the method of the invention includes
codon-optimised forms for expression in yeast.

By "a precursor of insulin" or "a precursor of an insulin analogue" is to be understood a
single-chain polypeptide which, by one or more subsequent chemical and/or
enzymatic processes, can be converted to a two-chain insulin or insulin analogue
molecule having the correct establishment of the three disulphide bridges as found in
natural human insulin. Preferred insulin precursors are MI1, B(1-29)-A(1-21); MI3,
B(1-29)-Ala-Ala-Lys-A(1-21) (as described in e.g. EP 163 529); X14,
B(1-27-Asp-Lys)-Ala-Ala-Lys-A(1-21) (as described in e.g. PCT publication
No. 95/00550); B(1-27-Asp-Lys)-A(1-21); B(1-27-Asp-Lys)-Ser-Asp-Asp-Ala-Lys-A(1-21);
B(1-29)-Ala-Ala-Arg-A(1-21) (as described in e.g. PCT publication No. 95/07931); MI5,
B(1-29)-Ser-Asp-Asp-Ala-Lys-A(1-21); and B(1-29)-Ser-Asp-Asp-Ala-Arg-A(1-21); and
more preferably MI1, B(1-29)-A(1-21), MI3, B(1-29)-Ala-Ala-Lys-A(1-21) and MI5,
B(1-29)-Ser-Asp-Asp-Ala-Lys-A(1-21).
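To illustrate how such a single-chain precursor is composed, the Python sketch below assembles MI3, B(1-29)-Ala-Ala-Lys-A(1-21), from the human insulin B- and A-chain sequences. It then simulates cleavage C-terminal to every Lys, the specificity of a lysyl-specific endoprotease such as the Achromobacter lyticus enzyme referred to in Figs. 5 and 6. The helper names are hypothetical. The sketch treats the precursor as a linear string and ignores neighbouring-residue effects. In the folded molecule, the three disulphide bridges keep B(1-29) and A(1-21) together, so removing the Ala-Ala-Lys bridge gives two-chain des(B30) insulin rather than free peptides.

```python
# Human insulin chains in one-letter code.
B_CHAIN = "FVNQHLCGSHLVEALYLVCGERGFFYTPKT"  # B(1-30)
A_CHAIN = "GIVEQCCTSICSLYQLENYCN"           # A(1-21)


def mi3_precursor() -> str:
    """B(1-29)-Ala-Ala-Lys-A(1-21): drop B30 Thr and bridge the chains with AAK."""
    return B_CHAIN[:29] + "AAK" + A_CHAIN


def lysyl_digest(seq: str) -> list[str]:
    """Split a linear sequence after every Lys (K), as a Lys-specific protease would."""
    fragments, start = [], 0
    for i, residue in enumerate(seq):
        if residue == "K":
            fragments.append(seq[start:i + 1])
            start = i + 1
    if start < len(seq):
        fragments.append(seq[start:])
    return fragments


mi3 = mi3_precursor()
print(len(mi3))           # 53
print(lysyl_digest(mi3))  # ['FVNQHLCGSHLVEALYLVCGERGFFYTPK', 'AAK', 'GIVEQCCTSICSLYQLENYCN']
```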

Examples of insulins or insulin analogues which can be produced in this way are
human insulin, preferably des(B30) human insulin, porcine insulin; and insulin
analogues wherein at least one Lys or Arg is present, preferably insulin analogues
wherein PheB1 has been deleted, insulin analogues wherein the A-chain and/or the
B-chain have an N-terminal extension and insulin analogues wherein the A-chain and/or
the B-chain have a C-terminal extension. Other preferred insulin analogues are those
wherein one or more of the amino acid residues, preferably one, two, or three of them,
have been substituted by another codable amino acid residue. Thus, in position A21 a
parent insulin may, instead of Asn, have an amino acid residue selected from the group
comprising Ala, Gln, Glu, Gly, His, Ile, Leu, Met, Ser, Thr, Trp, Tyr or Val, in particular
an amino acid residue selected from the group comprising Gly, Ala, Ser, and Thr. The
insulin analogues may also be modified by a combination of the changes outlined
above. Likewise, in position B28 a parent insulin may, instead of Pro, have an amino
acid residue selected from the group comprising Asp and Lys, preferably Asp, and in
position B29 a parent insulin may, instead of Lys, have the amino acid Pro. The
expression "a codable amino acid residue" as used herein designates an amino acid
residue which can be coded for by the genetic code, i.e. a triplet ("codon") of
nucleotides.
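The single-residue substitutions described above amount to position-specific edits of the parent chains. In the Python sketch below, a hypothetical helper `substitute` checks the parent residue before replacing it. B28 Pro→Asp and A21 Asn→Gly, both taken from the lists above, serve as examples; positions are 1-based, as in the B- and A-chain numbering.

```python
B_CHAIN = "FVNQHLCGSHLVEALYLVCGERGFFYTPKT"  # human insulin B(1-30)
A_CHAIN = "GIVEQCCTSICSLYQLENYCN"           # human insulin A(1-21)


def substitute(chain: str, position: int, parent: str, new: str) -> str:
    """Replace the residue at a 1-based position after checking the parent residue."""
    found = chain[position - 1]
    if found != parent:
        raise ValueError(f"expected {parent} at position {position}, found {found}")
    return chain[:position - 1] + new + chain[position:]


b_asp28 = substitute(B_CHAIN, 28, "P", "D")  # B28 Pro -> Asp
a_gly21 = substitute(A_CHAIN, 21, "N", "G")  # A21 Asn -> Gly
print(b_asp28)  # FVNQHLCGSHLVEALYLVCGERGFFYTDKT
print(a_gly21)  # GIVEQCCTSICSLYQLENYCG
```

Checking the parent residue guards against off-by-one numbering errors, which are easy to make when chains are handled as zero-indexed strings.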

The signal sequence (SP) may encode any signal peptide which ensures an effective
direction of the expressed polypeptide into the secretory pathway of the cell. The signal
peptide may be a naturally occurring signal peptide or functional parts thereof, or it may
be a synthetic peptide. Suitable signal peptides have been found to be the α-factor
signal peptide, the signal peptide of mouse salivary amylase, a modified
carboxypeptidase signal peptide, the yeast BAR1 signal peptide or the Humicola
lanuginosa lipase signal peptide, or a derivative thereof. The mouse salivary amylase
signal sequence is described by Hagenbuchle, O. et al., Nature 289 (1981) 643-646.
The carboxypeptidase signal sequence is described by Valls, L.A. et al., Cell 48 (1987)
887-897. The BAR1 signal peptide is disclosed in WO 87/02670. The yeast aspartic
protease 3 signal peptide is described in Danish patent application No. 0828/93.

The yeast processing site encoded by the DNA sequence PS may suitably be any
paired combination of Lys and Arg, such as LysArg, ArgLys, ArgArg or LysLys, which
permits processing of the polypeptide by the KEX2 protease of Saccharomyces
cerevisiae or the equivalent protease in other yeast species (Julius, D.A. et al., Cell 37
(1984) 1075). If KEX2 processing is not convenient, e.g. if it would lead to cleavage of
the polypeptide product due to the presence of two consecutive basic amino acids
internally in the desired product, a processing site for another protease may be
selected comprising an amino acid combination which is not found in the polypeptide
product, e.g. the processing site for FXa, IleGluGlyArg (cf. Sambrook, J., Fritsch, E.F.
and Maniatis, T., Molecular Cloning: A Laboratory Manual, Cold Spring Harbor
Laboratory Press, New York, 1989).
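The selection rule above can be sketched in Python: use a dibasic KEX2 site unless the product itself contains a dibasic pair, and otherwise fall back to a site that does not occur in the product. The function names are hypothetical. The sketch only scans the one-letter sequence and does not model cleavage efficiency or sequence context. Glucagon, whose Arg17-Arg18 forms an internal dibasic pair, illustrates the fallback to the FXa site IleGluGlyArg (IEGR). MI3, which contains no internal dibasic pair, keeps the KEX2 site.

```python
from typing import Optional

DIBASIC = {"KR", "RK", "RR", "KK"}  # paired Lys/Arg combinations listed in the text
FXA_SITE = "IEGR"                   # IleGluGlyArg, the factor Xa site named in the text


def internal_dibasic_sites(product: str) -> list[int]:
    """Return the 1-based positions at which a dibasic pair starts in the product."""
    return [i + 1 for i in range(len(product) - 1) if product[i:i + 2] in DIBASIC]


def choose_processing_site(product: str) -> Optional[str]:
    """Prefer a KEX2 dibasic site; otherwise use an FXa site absent from the product."""
    if not internal_dibasic_sites(product):
        return "KR"
    if FXA_SITE not in product:
        return FXA_SITE
    return None  # neither site is safe; another protease would have to be chosen


glucagon = "HSQGTFTSDYSKYLDSRRAQDFVQWLMNT"
mi3 = "FVNQHLCGSHLVEALYLVCGERGFFYTPK" + "AAK" + "GIVEQCCTSICSLYQLENYCN"
print(internal_dibasic_sites(glucagon))  # [17]
print(choose_processing_site(glucagon))  # IEGR
print(choose_processing_site(mi3))       # KR
```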

Two of the preferred DNA constructs encoding leader sequences are incorporated in
SEQ ID Nos. 1 and 2, as shown in Fig. 2, codon 1078-1209, and Fig. 3, codon
1028-1206, or suitable modifications thereof. Examples of suitable modifications of the
DNA sequence are nucleotide substitutions which do not give rise to another amino acid
sequence of the protein, but which may correspond to the codon usage of the
organism, preferably a fungal organism, such as a yeast, into which the DNA construct
is inserted, or nucleotide substitutions which do give rise to a different amino acid
sequence and therefore, possibly, a different protein structure. Other examples of
possible modifications are insertion of one or more codons into the sequence, addition
of one or more codons at either end of the sequence and deletion of one or more
codons at either end of or within the sequence.
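Whether a nucleotide substitution is silent (same amino acid sequence, possibly better matched to host codon usage) or changes the encoded protein can be checked by translating both versions. The Python sketch below uses the standard genetic code. The short example sequences are hypothetical and are not taken from SEQ ID Nos. 1 or 2.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG; '*' = stop.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna: str) -> str:
    """Translate an in-frame coding sequence whose length is a multiple of 3."""
    dna = dna.upper()
    if len(dna) % 3:
        raise ValueError("sequence length is not a multiple of 3")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def is_silent(original: str, modified: str) -> bool:
    """True if both coding sequences encode the same amino acid sequence."""
    return translate(original) == translate(modified)


print(translate("CAGCCAATCGACGAC"))                     # QPIDD
print(is_silent("CAGCCAATCGACGAC", "CAACCTATTGATGAT"))  # True
print(is_silent("CAGCCAATCGACGAC", "CAGCCAATGGACGAC"))  # False (Ile -> Met)
```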

One aspect of the invention is a recombinant expression vector carrying any one of the
expression cassettes

5'-P-SP-LP-(PS)-(S)-(PS)-*gene*-(T)i-3'

5'-P-SP-LP-PS-*gene*-(T)i-3'

5'-P-SP-LP-S-PS-*gene*-(T)i-3'

5'-P-SP-LP-PS-S-*gene*-(T)i-3'

5'-P-SP-LP-S-*gene*-(T)i-3'

5'-P-SP-LP-*gene*-(T)i-3'

5'-P-SP-LP-PS-S-PS-*gene*-(T)i-3'

wherein P is a promoter sequence, SP, LP, PS, S, and *gene* are as defined above,
T is a suitable terminator, e.g. the TPI terminator (cf. Alber, T. and Kawasaki, G.,
J. Mol. Appl. Genet. 1 (1982) 419-434), and i is 1 or 0. The vector may be any vector
which is capable of replicating in yeast organisms. The promoter may be any DNA
sequence which shows transcriptional activity in yeast and may be derived from genes
encoding proteins either homologous or heterologous to yeast. The promoter is
preferably derived from a gene encoding a protein homologous to yeast. Examples of
suitable promoters for use in yeast host cells are the Saccharomyces cerevisiae
MFα1, TPI, ADH and PGK promoters. The vector may also comprise the yeast 2 µ
plasmid replication genes REP 1-3 and origin of replication, and a selectable marker,
e.g. the Schizosaccharomyces pombe TPI gene as described by Russell, P.R., Gene 40
(1985) 125-130.
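The concrete cassettes are instances of the general form 5'-P-SP-LP-(PS)-(S)-(PS)-*gene*-(T)i-3', with each bracketed element either present or absent. The Python sketch below enumerates those combinations. It makes two assumptions. First, because the text does not list it, a combination that places two processing sites directly next to each other (PS-PS) is excluded. Second, the two ways of obtaining a single PS before *gene* are counted as one cassette. The terminator is written as T when i = 1 and omitted when i = 0.

```python
from itertools import product


def cassettes() -> list[str]:
    """Enumerate concrete cassettes from 5'-P-SP-LP-(PS)-(S)-(PS)-*gene*-(T)i-3'."""
    seen, result = set(), []
    for ps1, s, ps2, terminator in product((False, True), repeat=4):
        middle = [name for name, present in (("PS", ps1), ("S", s), ("PS", ps2)) if present]
        # Assumption: adjacent processing sites (PS-PS) are not an intended structure.
        if any(a == b == "PS" for a, b in zip(middle, middle[1:])):
            continue
        parts = ["P", "SP", "LP", *middle, "*gene*"] + (["T"] if terminator else [])
        cassette = "5'-" + "-".join(parts) + "-3'"
        if cassette not in seen:  # a single PS gives the same string in either slot
            seen.add(cassette)
            result.append(cassette)
    return result


for c in cassettes():
    print(c)
```

This yields the six structures listed in the summary, each with and without a terminator, i.e. twelve cassettes in total.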

The expression vector of the invention may be any expression vector that is 
conveniently subjected to recombinant DNA procedures, and the choice of vector will 
often depend on the host cell into which the vector is to be introduced. Thus, the 
vector may be an autonomously replicating vector, i.e. a vector which exists as an
extrachromosomal entity, the replication of which is independent of chromosomal 
replication, e.g. a plasmid. Alternatively, the vector may be one which, when 
introduced into a host cell, is integrated into the host cell genome and replicated 
together with the chromosome(s) into which it has been integrated. 

The methods used to ligate the sequence 5'-P-SP-LP-PS-*gene*-(T)i-3' and to insert it
into suitable yeast vectors containing the information necessary for yeast replication
are well known to persons skilled in the art (cf., for instance, Sambrook, J., Fritsch, E.F.
and Maniatis, T., op. cit.). It will be understood that the vector may be constructed either
by first preparing a DNA construct containing the entire sequence
5'-P-SP-LP-PS-*gene*-(T)i-3' and subsequently inserting this fragment into a suitable
expression vector, or by sequentially inserting DNA fragments into a suitable vector
containing genetic information for the individual elements (such as the promoter
sequence, the signal peptide, the leader sequence
GlnProIle(Asp/Glu)(Asp/Glu)X1(Glu/Asp)X2AsnZ(Thr/Ser)X3, the processing site, the
polypeptide, and, if present, the terminator sequence), followed by ligation.

In a further aspect, the present invention relates to a process for producing a
polypeptide (or protein) in yeast, the process comprising culturing a yeast cell, which is
capable of expressing said polypeptide and which is transformed with a yeast
expression vector as described above including a leader peptide sequence of the
invention, in a suitable medium to obtain expression and secretion of the said
polypeptide, after which the polypeptide is recovered from the medium. The term
"culturing" includes fermenting a yeast under laboratory and industrial conditions to
produce the polypeptide of interest.

Yeasts are fungi of the class Ascomycetes, subclass Hemiascomycetidae. The yeast
organism used in the method of the invention may be any suitable yeast organism
which, on cultivation, produces large amounts of the desired polypeptide. Examples of
suitable yeast organisms are strains of the yeast species Saccharomyces
cerevisiae, Saccharomyces kluyveri, Saccharomyces uvarum, Schizosaccharomyces
pombe, Kluyveromyces lactis, Hansenula polymorpha, Pichia pastoris, Pichia
methanolica, Pichia kluyveri, Yarrowia lipolytica, Candida sp., Candida utilis, Candida
cacaoi, Geotrichum sp. and Geotrichum fermentans. The person skilled in the art may
equally select any other fungal cell, such as cells of the genus Aspergillus, as the host
organism.

The transformation of the yeast cells may, for instance, be effected by protoplast
formation followed by transformation in a manner known per se. The medium used to
cultivate the cells may be any conventional medium suitable for growing yeast
organisms. The secreted polypeptide, a significant proportion of which will be present
in the medium in correctly processed form, may be recovered from the medium by
conventional procedures, including separating the yeast cells from the medium by
centrifugation or filtration, precipitating the proteinaceous components of the
supernatant or filtrate by means of a salt, e.g. ammonium sulphate, and purifying by a
variety of chromatographic procedures, e.g. ion exchange chromatography, affinity
chromatography or the like.

The invention is further described in the following examples, which are not to be
construed as limiting the scope of the invention as claimed.

EXAMPLES 


Construction of yeast strains expressing the insulin precursor mediated by leaders
lacking N-linked glycosylation.

Synthetic genes encoding leaders without amino acid sequences potentially subject
to N-linked glycosylation, fused to the insulin precursor with or without an N-terminal
extension, were constructed using the polymerase chain reaction (PCR).
Oligonucleotides for PCR were synthesised on an automatic DNA synthesizer
(Applied Biosystems model 380A) using phosphoramidite chemistry and commercially
available reagents (Beaucage, S.L. and Caruthers, M.H., Tetrahedron Letters 22 (1981)
1859-1869). The PCR was performed using Pwo DNA polymerase or EHF polymerase
(Boehringer Mannheim GmbH, Sandhoefer Strasse 116, Mannheim, Germany) according
to the manufacturer's instructions, and the PCR mix was overlaid with 100 µl mineral oil
(Sigma Chemical Co., St. Louis, MO, USA).
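The leaders are designed to contain no N-X-S/T sequon (X being any residue except Pro), which is the consensus acceptor motif for N-linked glycosylation referred to in the claims. A minimal Python sketch of such a check follows. The function name is hypothetical, and the example leader is the TA57 sequence as read from the translation in Fig. 2. The scan only flags consensus motifs; it does not predict whether a given site would actually be glycosylated in yeast.

```python
import re

# N, then any residue except P, then S or T. The lookahead lets
# overlapping motifs (e.g. in "NNSS") all be reported.
SEQUON = re.compile(r"(?=N[^P][ST])")


def sequon_positions(peptide: str) -> list:
    """Return 1-based positions of Asn residues that start an N-X-S/T sequon (X != P)."""
    return [m.start() + 1 for m in SEQUON.finditer(peptide.upper())]


# TA57 leader as read from the translation in Fig. 2 (ends in the KR processing site).
TA57 = "QPIDDTESQTTSVNLMADDTESAFATQTNSGGLDVVGLISMAKR"

print(sequon_positions(TA57))       # []
print(sequon_positions("ANATNPS"))  # [2]  (NPS is excluded because X = P)
```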

PCR

5 µl oligonucleotide (50 pmol)
5 µl oligonucleotide (50 pmol)
10 µl 10X PCR buffer
8 µl dNTP mix
0.5 µl Pwo or EHF enzyme
0.5 µl pAK680 plasmid as template (0.2 µg DNA)
71 µl distilled water
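The component volumes can be checked for internal consistency with a few lines of Python. This sketch assumes that the garbled dNTP entry in the original listing reads 8 µl, which is the value that brings the reaction to a round 100 µl. The dictionary keys are descriptive labels only.

```python
# Reaction components in microlitres (dNTP volume assumed to be 8 ul).
components_ul = {
    "oligonucleotide 1 (50 pmol)": 5.0,
    "oligonucleotide 2 (50 pmol)": 5.0,
    "10X PCR buffer": 10.0,
    "dNTP mix": 8.0,
    "Pwo or EHF enzyme": 0.5,
    "pAK680 template (0.2 ug DNA)": 0.5,
    "distilled water": 71.0,
}

total_ul = sum(components_ul.values())

# Final buffer strength: 10 ul of a 10X stock diluted into the total volume.
buffer_x = 10 * components_ul["10X PCR buffer"] / total_ul

# 50 pmol of each primer in total_ul microlitres; pmol per ul equals uM.
primer_um = 50 / total_ul

# 0.2 ug (200 ng) of template expressed per ul of reaction.
template_ng_per_ul = 200 / total_ul

print(total_ul, buffer_x, primer_um, template_ng_per_ul)  # 100.0 1.0 0.5 2.0
```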

A total of 12 cycles was performed, each cycle consisting of 94 °C for 45 s, 40 °C for
1 min and 72 °C for 1.5 min. The PCR mixture was then loaded onto a 2.5% agarose gel,
and electrophoresis was performed using standard techniques (Sambrook J, Fritsch EF
and Maniatis T, Molecular Cloning, Cold Spring Harbor Laboratory Press, 1989). The
resulting DNA fragment was cut out of the agarose gel and isolated using the Gene Clean
kit (Bio 101 Inc., PO Box 2284, La Jolla, CA 92038, USA) according to the
manufacturer's instructions.
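The thermal profile can be written out as data to show how long the programme holds at the set temperatures. This is a minimal sketch. Ramp times, and any initial denaturation or final extension step (neither is mentioned in the text), are not modelled. The helper name is hypothetical.

```python
# (temperature in degrees C, hold time in seconds) for one cycle, as given in the text.
CYCLE = [(94, 45), (40, 60), (72, 90)]
N_CYCLES = 12


def hold_time_s(cycle, n_cycles):
    """Total time spent at the set temperatures, ignoring ramping between steps."""
    return n_cycles * sum(seconds for _, seconds in cycle)


total = hold_time_s(CYCLE, N_CYCLES)
print(total, total / 60)  # 2340 39.0
```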

Certain leader DNA sequences were constructed by overlap PCR as described by
Horton, R.M., Cai, Z., Ho, S.N. and Pease, L.R.: Gene splicing by overlap extension:
tailor-made genes using the polymerase chain reaction. Biotechniques 8 (1990)
528-535.
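In overlap extension, two PCR products that share an identical stretch at their adjoining ends anneal through that stretch and are extended. The final product is therefore the first fragment followed by the non-overlapping remainder of the second. The sketch below models only that expected sequence outcome, not the PCR itself. The function name and the 15-nt default minimum overlap are assumptions. The toy fragments are taken from the start of the YAP3 signal peptide region in Fig. 2.

```python
def overlap_join(left: str, right: str, min_overlap: int = 15) -> str:
    """Return the expected fusion product of two fragments sharing an end overlap.

    The longest suffix of `left` that equals a prefix of `right` (of at least
    `min_overlap` nt) is taken as the junction; the overlap appears once in
    the product.
    """
    if min_overlap < 1:
        raise ValueError("min_overlap must be at least 1")
    left, right = left.upper(), right.upper()
    for k in range(min(len(left), len(right)), min_overlap - 1, -1):
        if left[-k:] == right[:k]:
            return left + right[k:]
    raise ValueError("fragments do not share a sufficient overlap")


# Two short fragments overlapping by the 9-nt stretch GTAAGATCT.
left = "ATGAAACTGAAAACTGTAAGATCT"
right = "GTAAGATCTGCGGTCC"
print(overlap_join(left, right, min_overlap=6))  # ATGAAACTGAAAACTGTAAGATCTGCGGTCC
```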

The purified PCR DNA fragment was dissolved in distilled water and restriction
endonuclease buffer and typically cut with the restriction endonucleases BglII and
NcoI according to standard techniques (Sambrook J, Fritsch EF and Maniatis T,
Molecular Cloning, Cold Spring Harbor Laboratory Press, 1989). The resulting BglII-NcoI
DNA fragment was subjected to agarose gel electrophoresis and purified using the
Gene Clean kit as described.

The expression plasmid pAK721 or a similar plasmid of the C-POT type (see Fig. 1) was
typically cut with the restriction endonucleases BglII and XbaI, and the vector fragment
of 10849 base pairs was isolated using the Gene Clean kit as described.

Typically, the plasmid pAK773 encoding the N-terminally extended EEGEPK-insulin
precursor was cut with the restriction endonucleases NcoI and XbaI, and the DNA
fragment of 209 base pairs was isolated using the Gene Clean kit as described.
The three DNA fragments were ligated together using T4 DNA ligase under standard
conditions (Sambrook J, Fritsch EF and Maniatis T, Molecular Cloning, Cold Spring
Harbor Laboratory Press, 1989). The ligation mix was then transformed into a
competent E. coli strain (R-, M+), followed by selection for ampicillin resistance.
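The size of the NcoI-XbaI insert can be checked against the sequence shown in Fig. 2 (pAK855). This assumes that the NcoI-XbaI region of pAK855, which encodes the same EEGEPK-insulin precursor, matches the corresponding fragment of pAK773. The sketch below locates the recognition sites on the top strand and measures the distance between the two cuts. The helper names are hypothetical, and only the part of the Fig. 2 sequence from position 1151 to 1410 is included.

```python
# Recognition sequence and top-strand cut offset for each enzyme (^ marks the cut).
ENZYMES = {
    "NcoI": ("CCATGG", 1),  # C^CATGG
    "XbaI": ("TCTAGA", 1),  # T^CTAGA
}

# Top strand of Fig. 2, positions 1151-1410.
START = 1151
FIG2_1151 = (
    "CTACTCAAACTAACTCTGGTGGTTTGGATGTTGTTGGTTTGATCTCCATG"
    "GCTAAGAGAGAAGAAGGTGAACCAAAGTTCGTTAACCAACACTTGTGCGG"
    "TTCCCACTTGGTTGAAGCTTTGTACTTGGTTTGCGGTGAAAGAGGTTTCT"
    "TCTACACTCCTAAGGCTGCTAAGGGTATTGTCGAACAATGCTGTACCTCC"
    "ATCTGCTCCTTGTACCAATTGGAAAACTACTGCAACTAGACGCAGCCCGC"
    "AGGCTCTAGA"
)


def cut_sites(seq: str, enzyme: str) -> list:
    """0-based index of the first nucleotide after each top-strand cut."""
    site, offset = ENZYMES[enzyme]
    hits, i = [], seq.find(site)
    while i != -1:
        hits.append(i + offset)
        i = seq.find(site, i + 1)
    return hits


nco = cut_sites(FIG2_1151, "NcoI")
xba = cut_sites(FIG2_1151, "XbaI")
print([START + c for c in nco], [START + c for c in xba])  # [1197] [1406]
print(xba[0] - nco[0])  # 209 nt on the top strand (positions 1197-1405)
```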

Plasmid from the resulting E. coli transformants was isolated using standard techniques
(Sambrook J, Fritsch EF and Maniatis T, Molecular Cloning, Cold Spring Harbor
Laboratory Press, 1989) and checked for the insert with appropriate restriction
endonucleases, i.e. BglII, EcoRI, NcoI and XbaI. DNA sequence analysis (Sequenase,
U.S. Biochemical Corp., USA) showed that the selected plasmid encoded the
leader-MI3 insulin precursor, with the DNA encoding the leader inserted upstream of
the DNA encoding the MI3 insulin precursor.

An example of a DNA sequence, pAK855 (SEQ ID No. 1), encoding the YAP3 signal
peptide, a leader without potential N-linked glycosylation sites (the TA57 leader) and
the EEGEPK-MI3 insulin precursor complex is shown in Fig. 2.
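The protein-coding part of Fig. 2 can be cross-checked by translating the top strand. The sketch below builds the standard genetic code from the usual TCAG-ordered lookup string. It translates from position 1201, which is in frame with the ATG at position 1015. The result runs from the C-terminal AKR of the TA57 leader, through the EEGEPK extension, B-chain residues 1-29, the bridging AAK and the A chain, to the stop codon. The helper names are hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code: one letter per codon in TCAG x TCAG x TCAG order.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_to_stop(dna: str) -> str:
    """Translate reading frame 1 of `dna` up to (not including) the first stop codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


# Top strand of Fig. 2, positions 1201-1400.
FIG2_1201 = (
    "GCTAAGAGAGAAGAAGGTGAACCAAAGTTCGTTAACCAACACTTGTGCGG"
    "TTCCCACTTGGTTGAAGCTTTGTACTTGGTTTGCGGTGAAAGAGGTTTCT"
    "TCTACACTCCTAAGGCTGCTAAGGGTATTGTCGAACAATGCTGTACCTCC"
    "ATCTGCTCCTTGTACCAATTGGAAAACTACTGCAACTAGACGCAGCCCGC"
)

protein = translate_to_stop(FIG2_1201)
print(protein)
# AKREEGEPKFVNQHLCGSHLVEALYLVCGERGFFYTPKAAKGIVEQCCTSICSLYQLENYCN
```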

An example of a DNA sequence (SEQ ID No. 2) encoding the YAP3 signal peptide, a
synthetic leader without potential N-linked glycosylation sites (the TA69 leader) and the
MI3 insulin precursor without N-terminal extension is shown in Fig. 3.



The yeast expression plasmids used are of the C-POT type (see Fig. 1 and Fig. 4) and are
similar to those described in EP 171 142, which contain the Schizosaccharomyces
pombe triose phosphate isomerase gene (POT) for plasmid selection and stabilisation
in S. cerevisiae. pAK855 also contains the S. cerevisiae triose phosphate isomerase
promoter and terminator. The promoter and terminator are similar to those described in
plasmid pKFN1003 (described in WO 90/100075), as are all other sequences in the
plasmid except the EcoRI-XbaI fragment encoding the YAP3 signal peptide-leader
without N-linked glycosylation-MI3 insulin precursor with or without N-terminal
extension.


Purified LA34/IP fusion protein was processed to desB30 insulin by Sepharose-bound
Achromobacter lyticus lysyl-specific protease (EC 3.4.21.50) (Fig. 5, Fig. 6). From the
RP-HPLC analysis results, the conversion yield for the removal of the LA34 leader from
the IP molecule was calculated for each collected sample and plotted as a function of
reaction time. A curve for a first-order reaction reaching a (pseudo-)equilibrium can be
fitted to the data points, as shown in Fig. 5 and Fig. 6. Electrospray mass spectrometry
was performed on the proteinaceous material isolated from the two main peaks eluted
in the RP-HPLC fractionation of the final reaction mixture. The first eluting peak gave an
Mw of 5706 Da, corresponding to des(B30) human insulin (calculated Mw: 5706 Da), and
the second peak gave an Mw of 5625 Da, corresponding to the di-mannosylated
LA34-EEAEAEAEPK polypeptide lacking the dipeptide QP (calculated Mw: 5627 Da),
the QP dipeptide presumably having been removed by dipeptidyl aminopeptidase
during secretion. Thus, almost complete cleavage of the precursor to an active desB30
insulin molecule took place within the reaction time.
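The calculated value for des(B30) human insulin can be reproduced from the chain sequences. The sketch below sums commonly tabulated average residue masses for the A chain (A1-A21) and for the B chain without ThrB30 (B1-B29). It then adds one water per chain and subtracts two hydrogens for each of the three disulfide bonds. The residue masses are standard average values, not figures taken from this document.

```python
# Commonly tabulated average residue masses (Da) for the residues present in insulin.
RESIDUE = {
    "G": 57.0513, "A": 71.0779, "S": 87.0773, "P": 97.1152, "V": 99.1311,
    "T": 101.1039, "C": 103.1429, "L": 113.1576, "I": 113.1576,
    "N": 114.1026, "Q": 128.1292, "K": 128.1723, "E": 129.1140,
    "H": 137.1393, "F": 147.1739, "R": 156.1857, "Y": 163.1733,
}
WATER = 18.0153          # one H2O per free chain (N-terminal H + C-terminal OH)
H2 = 2 * 1.00794         # two hydrogens lost per disulfide bond


def chain_mass(seq: str) -> float:
    """Average mass of one linear, reduced peptide chain."""
    return sum(RESIDUE[aa] for aa in seq) + WATER


A_CHAIN = "GIVEQCCTSICSLYQLENYCN"             # A1-A21
B_CHAIN_DES_B30 = "FVNQHLCGSHLVEALYLVCGERGFFYTPK"  # B1-B29

# Insulin has three disulfide bonds (two inter-chain, one intra-A-chain).
mass = chain_mass(A_CHAIN) + chain_mass(B_CHAIN_DES_B30) - 3 * H2
print(round(mass, 1))  # 5706.5
```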
CLAIMS

1. A DNA construct encoding a polypeptide and having the structure
SP-LP-(PS)-(S)-(PS)-*gene*,

wherein SP is a DNA sequence (presequence) encoding a signal peptide, LP is a
DNA sequence encoding a synthetic leader peptide (propeptide) wherein N-linked
glycosylation is lacking, PS is a DNA sequence encoding a protease processing
site which is optional in both positions, S is a DNA sequence encoding a spacer
peptide which is optional, and *gene* is a DNA sequence encoding a polypeptide.

2. A DNA construct according to claim 1, and having the structure
SP-LP-PS-*gene*, 

wherein SP, LP, PS, and *gene* have the meanings defined above. 

3. A DNA construct according to claim 2, which furthermore comprises a sequence
encoding a spacer peptide located at the 5' end of *gene* and optionally
comprises a sequence encoding a protease processing site located between the
3' end of the sequence encoding said spacer peptide and the 5' end of said
*gene*.

4. A DNA construct according to any one of the preceding claims which is 
furthermore characterised in that O-linked glycosylation of LP is lacking. 

5. A DNA construct according to any one of claims 1, 2 and 3 which is furthermore
characterised in that LP has O-linked glycosylation.
6. A DNA construct according to any one of the preceding claims, characterised in 

that LP does not comprise the consensus N-linked glycosylation sites NXT/S, 

wherein X designates any codable amino acid. 
7. A DNA construct according to any one of the preceding claims, wherein SP is a
DNA sequence selected from the group of DNA sequences encoding the S.
cerevisiae α-factor signal peptide, the signal peptide of mouse salivary amylase,
the yeast carboxypeptidase signal peptide, the yeast aspartic protease 3 signal
peptide or the yeast BAR1 signal peptide.
8. A DNA construct according to any one of the preceding claims, wherein LP is a
DNA sequence encoding a leader peptide with the general formula I:
Q/SPIDDTESQTTSVNLMADDTESA/RFATYTXLDVVN/GL(ISMA)/(PGA)KR (I)
wherein

X is a codable amino acid or preferably a sequence of from 1 to 5 codable amino
acids which may be the same or different, and is preferably selected from the
group consisting of T, L, A, V, D, P, H, N, S, G, and Y is a codable amino acid selected
from the group consisting of Q and N.

9. A DNA construct according to claim 8, wherein Y is Q and X does not comprise S 
or T. 

10. A DNA construct according to any one of claims 1 to 7, wherein LP is a DNA
sequence encoding a synthetic leader peptide with the general formula II:

QPIDD(A/D)E(A/D)Q(A/D)(A^

VVNLI(A/D)MAKR (II)
wherein (A/D) can be any codable amino acid, but preferably is alanine (A) or
aspartic acid (D).

11. A DNA construct according to any one of claims 1 to 7, wherein LP is a DNA
sequence encoding a synthetic leader peptide with the general formula III:

QPIDD(A/D)E(A/D)Q(A/D)(A/DX
DVVNLI(A/D)MA (III)

wherein (A/D) can be any codable amino acid, but preferably is alanine (A), serine
(S), or aspartic acid (D).

12. A DNA construct according to any one of the preceding claims, wherein X is 
selected from the sequences NA, TLA, DLA, PLA, TLAGG, TLADGG, TLADD, 
TLAGD, NSGG, TNSGG, and TSVGG. 

13. A DNA construct according to any one of the preceding claims, wherein the
leader peptide coded for by the DNA sequence LP is selected from the group
comprising the sequences LA23, TA54, TA56, TA57, TA59, LA64, TA65, TA67,
TA68, TA76, TA77, TA78, TA79, TA80, TA89, TA90, and TA101 of Table 1
herein.

14. A DNA construct according to any one of the preceding claims, wherein the
leader peptide coded for by the DNA sequence LP is selected from the group
comprising the sequences TA75, TA75.50, TA75.15, TA75.4, TA75.51, TA75.53,
and TA75.64 of Table 1a herein.

15. A DNA construct according to any one of the preceding claims, wherein the 
leader peptide coded for by the DNA sequence LP is selected from the group 
comprising the sequences TA91, TA92, TA93, TA94, TA95, TA96, TA97, and 
TA98, of Table 1b herein. 

16. A DNA construct according to any one of the preceding claims, wherein PS is a 
DNA sequence encoding an endoprotease processing site which allows in vivo 
processing. 

17. A DNA construct according to the preceding claim wherein the processing site is 
selected from DNA sequences encoding a dibasic processing site, preferably 
encoding the amino acid sequences KR, RK, RR, or KK. 

18. A DNA construct according to any one of the preceding claims, wherein PS is a 
DNA sequence encoding an endoprotease processing site which allows in vitro 
processing. 

19. A DNA construct according to the preceding claim wherein the processing site is 
selected from DNA sequences encoding a monobasic or dibasic processing site, 
preferably encoding the amino acid sequences K, R, or KR, RK, RR, or KK. 

20. A DNA construct according to any one of the preceding claims, wherein the
polypeptide is a polypeptide which is heterologous to yeast.

21. A DNA construct according to the preceding claim, wherein the polypeptide is
selected from the group consisting of aprotinin, tissue factor pathway inhibitor, or
other protease inhibitors, insulin or insulin precursors, insulin-like polypeptides,
such as insulin-like growth factor I and insulin-like growth factor II, human or
bovine growth hormone, interleukin, glucagon, glucagon-like peptide 1, glucagon-
like peptide II, GRPP, tissue plasminogen activator, transforming growth factor α
or β, platelet-derived growth factor, enzymes, or a functional analogue thereof.

22. A DNA construct according to claim 18, wherein the polypeptide is selected from 
the group consisting of insulin or insulin precursors, insulin-like polypeptides, such 
as insulin-like growth factor I and insulin-like growth factor II, glucagon, glucagon- 
like peptide 1, glucagon-like peptide II, GRPP, or a functional analogue thereof. 
23. A DNA construct according to any one of claims 1 to 17, wherein the polypeptide
is a polypeptide which is homologous to yeast. 

24. A DNA construct according to the preceding claim, wherein the polypeptide is 
selected from the group consisting of the gene products of the KEX2 gene, and 
the YAP3 gene. 

25. A DNA construct according to any one of the preceding claims which furthermore 
comprises a promoter sequence located at the N-terminal end of the structure SP- 
LP-PS-*gene*. 

26. A DNA construct according to any one of the preceding claims which furthermore
comprises a promoter sequence located at the N-terminal end of the structure SP-
LP-(PS)-(S)-(PS)-*gene*.

27. A DNA construct according to claim 25 or 26, wherein the promoter sequence
is a yeast promoter sequence, preferably the TPI promoter.

28. An expression cassette comprising the DNA construct according to claim 25,
which additionally comprises a 5' terminally located promoter sequence and
a terminator sequence (T)i located at the 3' terminal of the structure SP-LP-PS-
*gene*, where i is 0 or 1.

29. An expression cassette comprising the DNA construct according to claim 26,
which additionally comprises a 5' terminally located promoter sequence and
a terminator sequence (T)i located at the 3' terminal of the structure SP-LP-(PS)-
(S)-(PS)-*gene*, where i is 0 or 1.

30. An expression cassette according to claim 28 or 29, wherein i is 1 and T is a
DNA sequence encoding the TPI terminator.

31. A yeast expression vector comprising the DNA construct according to any of the 
preceding claims. 

32. A yeast cell which is capable of expressing a polypeptide and which is 
transformed with a yeast expression vector according to claim 31. 

33. A yeast cell according to claim 32 selected from the group consisting of
Saccharomyces cerevisiae, Saccharomyces uvae, Saccharomyces kluyveri,
Schizosaccharomyces pombe, Saccharomyces uvarum, Kluyveromyces lactis,
Hansenula polymorpha, Pichia pastoris, Pichia methanolica, Pichia kluyveri,
Yarrowia lipolytica, Candida sp., Candida utilis, Candida cacaoi, Geotrichum sp.
and Geotrichum fermentans.

34. A process for producing a polypeptide in yeast, the process comprising culturing 
a yeast cell, which is capable of expressing the desired polypeptide and which is 
transformed with a yeast expression vector according to claim 31, in a suitable 
medium to obtain expression and secretion of the polypeptide, after which the 
polypeptide is recovered from the medium. 

35. A process according to the preceding claim, wherein the yeast cell is selected
from the group consisting of S. cerevisiae, Saccharomyces uvae, Saccharomyces
kluyveri, Schizosaccharomyces pombe, Saccharomyces uvarum, Kluyveromyces
lactis, Hansenula polymorpha, Pichia pastoris, Pichia methanolica, Pichia kluyveri,
Yarrowia lipolytica, Candida sp., Candida utilis, Candida cacaoi, Geotrichum sp.
and Geotrichum fermentans, preferably Saccharomyces cerevisiae.

36. A DNA sequence encoding a synthetic prepro-leader peptide lacking the 
consensus N-linked glycosylation sites NXT/S, wherein X designates any codable 
amino acid which is not P. 

37. A DNA sequence according to the preceding claim selected from the group
consisting of
Q/SPIDDTESQTTSVNLMADDTESA/RFATYTXLDVVN/GL(ISMA)/(PGA)KR (I)
wherein

X is a codable amino acid or preferably a sequence of from 1 to 5 codable amino
acids which may be the same or different, and is preferably selected from the
group consisting of T, L, A, V, D, P, H, N, S, G, and Y is a codable amino acid selected
from the group consisting of Q and N, and wherein the C-terminal KR is an
optional processing site.

38. A DNA sequence according to the preceding claim selected from the group 
consisting of LA23, TA54, TA56, TA57, TA59, LA64, TA65, TA67, TA68, TA76, 
TA77, TA78, TA79, TA80, TA89, TA90, and TA101 of Table 1 herein. 

39. A DNA sequence according to claim 36 selected from the group consisting of
QPIDD(A/D)E(A/D)Q(A/D)(A/D)(A/D)VNLMADD(A/D)E(A/D)AFA(A/D)Q(^
VVNLI(A/D)MAKR (II)
wherein (A/D) can be any codable amino acid, but preferably is alanine (A), serine
(S), or aspartic acid (D), and wherein the C-terminal KR is an optional processing
site.

40. A DNA sequence according to the preceding claim selected from the group
consisting of TA75, TA75.50, TA75.15, TA75.4, TA75.51, TA75.58, and TA75.64
of Table 1a herein.

41. A DNA sequence according to claim 36 selected from the group consisting of
QPIDD(A/D)E(A/D)Q(A/D)(A/D)(A/D)VNLMADD(A/D)E(A/D)AFA(A/D)Q(^
DVVNLI(A/D)MA (III)

wherein (A/D) can be any codable amino acid, but preferably is alanine (A), serine
(S), or aspartic acid (D).

42. A DNA sequence according to the preceding claim selected from the group
consisting of TA91, TA92, TA93, TA94, TA95, TA96, TA97, and TA98, of Table
1b herein.

43. A synthetic prepro-leader peptide lacking the consensus N-linked glycosylation
sites NXT/S, wherein X designates any codable amino acid which is not P.

44. A synthetic prepro-leader peptide according to the preceding claim selected from
the group consisting of

Q/SPIDDTESQTTSVNLMADDTESA/RFATYTXLDVVN/GL(ISMA)/(PGA)KR (I)
wherein

X is a codable amino acid or preferably a sequence of from 1 to 5 codable amino
acids which may be the same or different, and is preferably selected from the
group consisting of T, L, A, V, D, P, H, N, S, G, and Y is a codable amino acid selected
from the group consisting of Q and N, and wherein the C-terminal KR is an
optional processing site.

45. A synthetic prepro-leader peptide according to the preceding claim selected from
the group consisting of LA23, TA54, TA56, TA57, TA59, LA64, TA65, TA67,
TA68, TA76, TA77, TA78, TA79, TA80, TA89, TA90, and TA101 of Table 1
herein.

46. A synthetic prepro-leader peptide according to claim 36 selected from the group
consisting of
QPIDD(A/D)E(A/D)Q^
VVNLI(A/D)MAKR (II)

wherein (A/D) can be any codable amino acid, but preferably is alanine (A), serine
(S), or aspartic acid (D), and wherein the C-terminal KR is an optional processing
site.

47. A synthetic prepro-leader peptide according to the preceding claim selected
from the group consisting of TA75, TA75.50, TA75.15, TA75.4, TA75.51, TA75.58,
and TA75.64 of Table 1a herein.
48. A synthetic prepro-leader peptide according to claim 36 selected from the group
consisting of

QPIDD(A/D)E(A/D)Q(^

DVVNLI(A/D)MA (III)
wherein (A/D) can be any codable amino acid, but preferably is alanine (A), serine
(S), or aspartic acid (D).

49. A synthetic prepro-leader peptide according to the preceding claim selected from
the group consisting of TA91, TA92, TA93, TA94, TA95, TA96, TA97, and TA98,
of Table 1b herein.

50. The use of a first DNA sequence encoding a synthetic prepro-leader lacking N- 
linked glycosylation sites for secretion of a protein in fungal cells, such as yeast 
cells. 

51. Use according to the preceding claim wherein said prepro-leader additionally
lacks O-linked glycosylation sites.

52. Use according to any of claims 36 to 37, wherein said synthetic prepro-leader
has an amino acid sequence selected from the group consisting of
Q/SPIDDTESQTTSVNLMADDTESA/RFATYTXLDVVN/GL(ISMA)/(PGA)KR (I)
wherein

X is a codable amino acid or preferably a sequence of from 1 to 5 codable amino
acids which may be the same or different, and is preferably selected from the
group consisting of T, L, A, V, D, P, H, N, S, G, and Y is a codable amino acid selected
from the group consisting of Q and N, and wherein the C-terminal KR is an
optional processing site.
53. Use according to the preceding claim wherein said prepro-leader is selected from
the group consisting of LA23, TA54, TA56, TA57, TA59, LA64, TA65, TA67,
TA68, TA76, TA77, TA78, TA79, TA80, TA89, TA90, and TA101 of Table 1
herein.

54. Use according to any of claims 36 to 37, wherein said synthetic prepro-leader
has an amino acid sequence selected from the group consisting of
QPIDD(A/D)E(A/D)Q(A/DX

VVNLI(A/D)MAKR (II)
wherein (A/D) can be any codable amino acid, but preferably is alanine (A), serine
(S), or aspartic acid (D), and wherein the C-terminal KR is an optional processing
site.

55. Use according to the preceding claim wherein said prepro-leader is selected from
the group consisting of TA75, TA75.50, TA75.15, TA75.4, TA75.51, TA75.58, and
TA75.64 of Table 1a herein.
56. Use according to the preceding claim wherein said synthetic prepro-leader has
an amino acid sequence selected from the group consisting of
QPIDD(A/D)E(A/D)Q(A^

DVVNLI(A/D)MA (III)
wherein (A/D) can be any codable amino acid, but preferably is alanine (A), serine
(S), or aspartic acid (D).

57. Use according to the preceding claim wherein said synthetic prepro-leader has
an amino acid sequence selected from the group consisting of TA91, TA92, TA93,
TA94, TA95, TA96, TA97, and TA98, of Table 1b herein.

58. Use according to any of claims 36 to 43, wherein said protein is encoded by a
second DNA sequence fused at the 5' end to said first DNA sequence encoding
said prepro-leader.

59. Use according to the preceding claim wherein a third DNA sequence encoding a
spacer peptide optionally having one or more processing sites is inserted in frame
between the 3' end of said first DNA sequence encoding said prepro-leader and
the 5' end of said second DNA sequence encoding said protein.
60. Use according to the preceding claim wherein the DNA sequence encoding said
spacer peptide is selected from DNA sequences encoding an oligopeptide having
1 to 12 amino acid residues, such as EEAEPK, EEGEPK, E(EA)3EPK, EEPK.

61. Use according to any of claims 36 to 46, wherein said protein is a heterologous
protein.

62. Use according to the preceding claim wherein said protein is selected from the
group consisting of aprotinin, tissue factor pathway inhibitor, or other protease
inhibitors, insulin or insulin precursors, insulin-like polypeptides, such as insulin-
like growth factor I and insulin-like growth factor II, human or bovine growth
hormone, interleukin, glucagon, glucagon-like peptide 1, glucagon-like peptide II,
GRPP, tissue plasminogen activator, transforming growth factor α or β, platelet-
derived growth factor, enzymes, or a functional analogue thereof.

63. Use according to any of claims 36 to 48 wherein said protein is insulin or an 
insulin precursor. 

64. Use according to any of claims 36 to 46, wherein said protein is a homologous
protein, preferably selected from the group consisting of the gene products of the
yeast KEX2 and YAP3 genes.
65. Use according to any of claims 36 to 50 wherein said yeast is selected from the
group consisting of S. cerevisiae, Saccharomyces uvae, Saccharomyces kluyveri,
Schizosaccharomyces pombe, Saccharomyces uvarum, Kluyveromyces lactis,
Hansenula polymorpha, Pichia pastoris, Pichia methanolica, Pichia kluyveri,
Yarrowia lipolytica, Candida sp., Candida utilis, Candida cacaoi, Geotrichum sp.
and Geotrichum fermentans, preferably Saccharomyces cerevisiae.
Fig. 2

EcoRI 

901 TTCTTGCTTA AATCTATAAC TACAAAAAAC ACATACAGGA ATTCCATTCA 
AAGAACGAAT TTAGATATTG ATGTTTTTTG TGTATGTCCT TAAGGTAAGT 

951 AGAATAGTTC AAACAAGAAG ATTACAAACT ATCAATTTCA TACACAATAT
TCTTATCAAG TTTGTTCTTC TAATGTTTGA TAGTTAAAGT ATGTGTTATA

+1 MKLKTVRSAVLS

BglII

1001 AAACGATTAA AAGAATGAAA CTGAAAACTG TAAGATCTGC GGTCCTTTCG 
TTTGCTAATT TTCTTACTTT GACTTTTGAC ATTCTAGACG CCAGGAAAGC 

+1 S L FA SQV LGQ PIDD TES 

StyI

1051 TCACTCTTTG CATCTCAGGT CCTTGGCCAA CCAATTGACG ACACTGAATC 
AGTGAGAAAC GTAGAGTCCA GGAACCGGTT GGTTAACTGC TGTGACTTAG

+1 QTT SVNL MAD DTE SAF 
1101 TCAAACTACT TCTGTCAACT TGATGGCTGA CGACACTGAA TCTGCTTTCG 
AGTTTGATGA AGACAGTTGA ACTACCGACT GCTGTGACTT AGACGAAAGC 

+1ATQT NSG GLDV VGL ISM

StyI



NcoI



1151 CTACTCAAAC TAACTCTGGT GGTTTGGATG TTGTTGGTTT GATCTCCATG 
GATGAGTTTG ATTGAGACCA CCAAACCTAC AACAACCAAA CTAGAGGTAC 

+1AK RE EGE PKF VNQH LCG 
StyI

NcoI

1201 GCTAAGAGAG AAGAAGGTGA ACCAAAGTTC GTTAACCAAC ACTTGTGCGG 
CGATTCTCTC TTCTTCCACT TGGTTTCAAG CAATTGGTTG TGAACACGCC 

+1 SHL VEAL Y L V CGE RGF 
HindIII



1251 TTCCCACTTG GTTGAAGCTT TGTACTTGGT TTGCGGTGAA AGAGGTTTCT 
AAGGGTGAAC CAACTTCGAA ACATGAACCA AACGCCACTT TCTCCAAAGA 

+1FYTP K A A KGIV EQC CTS 
Bsu36I 



1301 TCTACACTCC TAAGGCTGCT AAGGGTATTG TCGAACAATG CTGTACCTCC 
AGATGTGAGG ATTCCGACGA TTCCCATAAC AGCTTGTTAC GACATGGAGG 

+1 I C S L Y Q L ENY C N *
1351 ATCTGCTCCT TGTACCAATT GGAAAACTAC TGCAACTAGA CGCAGCCCGC 
TAGACGAGGA ACATGGTTAA CCTTTTGATG ACGTTGATCT GCGTCGGGCG 

XbaI



1401 AGGCTCTAGA AACTAAGATT AATATAATTA TATAAAAATA TTATCTTCTT
TCCGAGATCT TTGATTCTAA TTATATTAAT ATATTTTTAT AATAGAAGAA 
Fig. 3



EcoRI 



901 TTCTTGCTTA AATCTATAAC TACAAAAAAC ACATACAGGA ATTCCATTCA 
AAGAACGAAT TTAGATATTG ATGTTTTTTG TGTATGTCCT TAAGGTAAGT 

951 AGAATAGTTC AAACAAGAAG ATTACAAACT ATCAATTTCA TACACAATAT 
TCTTATCAAG TTTGTTCTTC TAATGTTTGA TAGTTAAAGT ATGTGTTATA

+1 MKLKTVRSAVLS 

BglII



1001 AAACGATTAA AAGAATGAAA CTGAAAACTG TAAGATCTGC GGTCCTTTCG
TTTGCTAATT TTCTTACTTT GACTTTTGAC ATTCTAGACG CCAGGAAAGC 

+1SLFA SQV LGQ P I D D TES 

StyI



1051 TCACTCTTTG CATCTCAGGT CCTTGGCCAA CCAATTGACG ACACTGAATC 
AGTGAGAAAC GTAGAGTCCA GGAACCGGTT GGTTAACTGC TGTGACTTAG 

+1 QTT SVNL MAD DTE SAF 
1101 TCAAACTACT TCTGTCAACT TGATGGCTGA CGACACTGAA TCTGCTTTCG 
AGTTTGATGA AGACAGTTGA ACTACCGACT GCTGTGACTT AGACGAAAGC 

+1ATQT NSG GLDV VGL PGA 
1151 CTACTCAAAC TAACTCTGGT GGTTTGGATG TTGTTGGTTT GCCAGGTGCT 
GATGAGTTTG ATTGAGACCA CCAAACCTAC AACAACCAAA CGGTCCACGA 

+1KRFV NQH LCG SHLV EAL 

HindIII



1201 AAGAGATTCG TTAACCAACA CTTGTGCGGT TCCCACTTGG TTGAAGCTTT 
TTCTCTAAGC AATTGGTTGT GAACACGCCA AGGGTGAACC AACTTCGAAA 

+1 Y L V CGER G F F Y T P KAA 

Bsu36I 



1251 GTACTTGGTT TGCGGTGAAA GAGGTTTCTT CTACACTCCT AAGGCTGCTA 
CATGAACCAA ACGCCACTTT CTCCAAAGAA GATGTGAGGA TTCCGACGAT 

+1KGIV EQC CTSI CSL Y Q L 
1301 AGGGTATTGT CGAACAATGC TGTACCTCCA TCTGCTCCTT GTACCAATTG 
TCCCATAACA GCTTGTTACG ACATGGAGGT AGACGAGGAA CATGGTTAAC 

+1 E N Y C N * 

XbaI



1351 GAAAACTACT GCAACTAGAC GCAGCCCGCA GGCTCTAGAA ACTAAGATTA 
CTTTTGATGA CGTTGATCTG CGTCGGGCGT CCGAGATCTT TGATTCTAAT 